1: Functions and vectors

BSTA 526: R Programming for Health Data Science

Author
Affiliation

Your name here - update this!!!!

OHSU-PSU School of Public Health

Published

January 8, 2026

Modified

January 7, 2026

1 Before you get started

Please save a copy of this as part1_FIRSTNAME_LASTNAME.qmd and work from that (where FIRSTNAME is your first name, and LASTNAME is your last name). This way, you’ll have the original as a reference just in case.

Also, the first time you try something, try to type out the answer rather than copying and pasting. It will help you understand what’s going on, because it forces you to read the code. However, if you find yourself getting too in the weeds with typing during class, copying and pasting works too! Practice typing on your own.

2 Welcome to R Programming!

This course introduces you to R in two parts:

Part 1 focuses on working through common tasks in data science:

  • importing data
  • wrangling data
  • visualizing data
  • summarizing data

Part 2 focuses on more advanced topics:

  • joining and merging data tables
  • automating analyses (purrr/for loops)

We might get to:

  • running basic statistical procedures
  • fancy tables in Quarto with kable, gt, and gtsummary
  • advanced Quarto topics

Throughout, we’ll practice reproducibility by using RStudio projects and Quarto documents as our workflow for sharing work in a reproducible way.

3 What is R?

  • R is an open source programming language for statistical computing, widely used for a variety of applications.
  • Why “R”??
    • Scheme inspired S (invented at Bell Labs in 1976) which inspired R (free and open source! in 1992)

4 Learning Objectives

By the end of this session, you should be able to:

  1. Work within the RStudio interface to run R code in a Quarto document
  2. Understand basic R syntax to use functions and assign values to objects
  3. Create and manipulate vectors and understand how R deals with missing data
  4. Install and load R packages

5 Introduction to R

5.1 R/RStudio

Link to video: https://youtu.be/oFmjHxl28H0

  • Winter 2025:
    • Start at minute 2. The first 2 minutes are on using RStudio server in the Cloud, which we are not using.
    • This video provides an overview of what the different parts of the RStudio interface are used for.
    • Replace references to RMarkdown (or .Rmd) with Quarto (or .qmd).
vembedr::embed_youtube("oFmjHxl28H0", width = 600, height=300)
  • A good reference built into RStudio is Help -> Cheatsheets -> RStudio IDE cheat sheet

  • A nice summary of the RStudio anatomy is here:

Here are some useful gifs about customizing the RStudio panels

5.2 RStudio projects

How do you eat an elephant?

One bite at a time. We will go over topics related to RStudio/file management again and again in this class, so don’t worry if it is confusing at first.

  • We will be using RStudio projects.
    • We will talk about them again later, but for now,
    • open RStudio by double clicking on the .Rproj file for part 1 (part_01.Rproj)

You may read this for more info, and watch this short video on creating new projects:

Link to video: https://youtu.be/D22THnoPA6w

vembedr::embed_youtube("D22THnoPA6w", width = 600, height=300)

5.3 Quarto (.qmd)

  • See intro to Quarto from BSTA 511 Week 1
    • Can view slides as html, pdf, or “continuous” webpage.

5.3.1 Create a Quarto file

5.3.2 Markdown for “word processing”

5.4 Code chunks

Link to video: https://youtu.be/0iETdE7WkqU

vembedr::embed_youtube("0iETdE7WkqU", width = 600, height=300)

The grey box below is a code chunk:

# basic math
4 + 5 
[1] 9
  • Everything that starts with a # is called a comment and is not code that runs. It is useful for making notes for yourself.
  • Below the comment is the actual code.
    • How do we run the code?

Try this one out. It’s the same code as above, but with no spaces. Does it still run?

# same code as above, without spaces
4+5
[1] 9

5.5 Useful keyboard shortcuts (Tools → Keyboard Shortcuts Help)

| action | mac | windows/linux |
|---|---|---|
| Run code in qmd or script | cmd + enter | ctrl + enter |
| Add code chunk | cmd + option + i | ctrl + alt + i |
| <- | option + - | alt + - |
| interrupt currently running code | esc | esc |
| in console, go to previously run code | up/down | up/down |
| %>% | cmd + shift + m | ctrl + shift + m |
| search files | cmd + shift + f | ctrl + shift + f |
| render qmd | cmd + shift + k | ctrl + shift + k |
| run entire code chunk | cmd + option + c | ctrl + alt + c |
| keyboard shortcut help | option + shift + k | alt + shift + k |

(see full list)

6 Using functions

Link to video: https://youtu.be/aQPOhhLinZM

vembedr::embed_youtube("aQPOhhLinZM", width = 600, height=300)

Below is an example of an R function:

# using a function: rounding numbers
round(3.14)
[1] 3
pi
[1] 3.141593
round(pi)
[1] 3

R functions can have multiple arguments

# using a function with more arguments
round(x = 3.14, digits = 1)
[1] 3.1

Do we have to “name” the arguments?
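
One way to explore this question is to call round() both ways and compare. This is a small sketch; the object names named_result and positional_result are made up for illustration. When arguments are not named, R matches them by position, so for round() the first value is treated as x and the second as digits.

```r
# named arguments: the order does not matter
named_result <- round(digits = 1, x = 3.14)

# unnamed arguments: matched by position (x first, then digits)
positional_result <- round(3.14, 1)

named_result
positional_result

# both calls give the same answer
identical(named_result, positional_result)
```

Naming arguments takes more typing, but it makes the code easier to read and protects against putting values in the wrong order.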

6.1 Getting Help

Learn more about the round() function with ?round:

?round
  • We can also type ?round in the Console instead of including it in a code chunk.
# can switch order of arguments (if you name them)
round(digits = 1, x = 3.14)
[1] 3.1

You may notice that boxes pop up as you type. These represent RStudio’s attempts to guess what you’re typing and share additional options.

There are many ways to get help. The more you learn how to get help, the easier your coding life will be. Here’s a list of options:

  • Google “question + rcran” (e.g., “hist rcran” or “make a boxplot ggplot”)
  • Google the error message in quotes (e.g., “Evaluation error: invalid type (closure) for variable ‘***’”)
  • Search the Posit Community forum (formerly RStudio Community)
  • Search Stack Overflow #r tag
  • Search GitHub for your function name to see examples, or search for the error
  • Use generative AI (ChatGPT, Perplexity, etc.)

Post a question somewhere friendly:

6.2 Challenge 1

  • What does the function hist do?
    • What are its main arguments?
    • How did you determine this?
  • Tricky bonus: what about +, which is actually a function?

7 Common errors

7.1 “Object not found”

This happens when you type the name of a variable (object) that does not exist:

hello

It can also be due to missing quotes:

install.packages(dplyr)

or misspellings (R is case-sensitive)!
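
To read the error message without stopping the document from rendering, one option is to capture it with tryCatch(). This is only a sketch: tryCatch() is more advanced than what we need today, and not_defined_yet and greeting are made-up names for illustration.

```r
# capture the error message instead of stopping, so we can read it
# (not_defined_yet is a made-up name that has never been assigned)
msg <- tryCatch(
  not_defined_yet,
  error = function(e) conditionMessage(e)
)
msg

# putting the text in quotes makes it character data, not an object name
greeting <- "hello"
greeting
```

The same idea applies to install.packages(): the package name needs quotes, as in install.packages("dplyr").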

7.2 Incomplete commands

  • In the console:
    • When the console is waiting for a new command, the prompt line begins with >
      • If the console prompt is +, then a previous command is incomplete
      • You can finish typing the command in the console window
      • If stressed and confused, press ESC many times (ESC = ESCAPE ME OUT OF HERE)
  • In a code chunk:
    • R will let you know there is an error with a red circle containing a white X (see below).
      • Note that all code chunks below this one will still have the red error circles until you fix the code.
    • What happens if you try to run the code below?
3 + (2*6

Change #| eval: false above to #| eval: true after you fix the code error.

7.3 “could not find function”

  • This can happen when you are calling a function but haven’t loaded the package that it “lives” in.
  • For example, the function day() being used below is from the lubridate package.
    • What error do we get when we run the code?
day("2025-01-09")

How do we fix this code?

# either specify the package in front of ::function()
lubridate::day("2025-01-09")
[1] 9
# or load the package first (preferably at beginning of script)
library(lubridate)

Attaching package: 'lubridate'
The following object is masked from 'package:vembedr':

    hms
The following objects are masked from 'package:base':

    date, intersect, setdiff, union
day("2025-01-09")
[1] 9

Or, maybe there was a misspelling…

dsy("2025-01-09")

8 Assigning objects with <-

Link to video: https://youtu.be/pW9wkwob1Es

vembedr::embed_youtube("pW9wkwob1Es", width = 600, height=300)
  • <- is the primary assignment operator in R
  • Some naming conventions in R
    • Object names cannot start with a number
    • Object names are case sensitive
    • No spaces in object names
# assigning value to an object
weight_kg <- 55
  • Now that the object has been assigned, we can reference that object by running its name:
# recall object
weight_kg
[1] 55
  • We can also use the object as a variable:
# multiply an object (convert kg to lb)
2.2 * weight_kg
[1] 121
  • We can create a new object (variable) based on the existing one:
# assign weight conversion to object
weight_lb <- 2.2 * weight_kg
  • Note that the code above only saves the value for weight_lb, but it doesn’t show us what the value is.
  • To see what the value is, you can
    • Check the Environment tab (this is not reproducible though)
    • Add () around the whole line of code to also see the value:
# added parentheses to see value of weight_lb in output
(weight_lb <- 2.2 * weight_kg)
[1] 121
  • Below we assign a new value to weight_kg
    • Did this change the value of weight_lb?
# reassign new value to an object
weight_kg <- 100

You can think of the names of objects like sticky notes. You have the option to place the sticky note (name) on any value you choose. You can pick up the sticky note and place it on another value, but you need to explicitly tell R when you want values assigned to certain objects.
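
The sticky-note idea can be sketched with the weight objects from above. The code repeats the earlier assignments so that it runs on its own.

```r
weight_kg <- 55
weight_lb <- 2.2 * weight_kg   # computed once, using the value 55

weight_kg <- 100               # move the "sticky note" to a new value
weight_lb                      # still based on 55; R did not recompute it

weight_lb <- 2.2 * weight_kg   # re-run the assignment to update weight_lb
weight_lb
```

R does not keep track of how weight_lb was calculated, so it only changes when we assign to it again.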

8.1 Removing objects

  • You can clear the entire environment using the button at the top of the Environment panel with a picture of a broom.
    • This may seem extreme, but don’t worry! We can re-create all the work we’ve already done by running each line of code again.
  • To remove an individual object, use the remove() function:
# remove object
remove(weight_lb) 
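
A reproducible way to confirm that an object is gone is the exists() function. In this sketch, temp_value is a made-up object used only for illustration.

```r
temp_value <- 10          # create a throwaway object
exists("temp_value")      # TRUE: the object is in the environment

remove(temp_value)        # rm(temp_value) does the same thing
exists("temp_value")      # FALSE: the name no longer points to a value
```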

8.2 Challenge 2

What is the value of each item at each step? (Hint: you can see the value of an object by typing in the name of the object, such as with the mass line below.)

mass <- 47.5            # 1. mass?
mass
[1] 47.5
width <- 122              # 2. width?
mass <- mass * 2.0        # 3. mass?
width <- width - 20       # 4. width?
mass_index <- mass/width  # 5. mass_index?

Make your answers here:

9 Vectors

Link to video: https://youtu.be/0qLgfpvzBqI

vembedr::embed_youtube("0qLgfpvzBqI", width = 600, height=300)

9.1 Creating vectors

  • c is for combine or concatenate
# assign vector
ages <- c(50, 55, 60, 65) 

# recall vector
ages
[1] 50 55 60 65

9.2 Learning things about vectors

# how many things are in the object?
length(ages)
[1] 4
# what type of object?
class(ages)
[1] "numeric"
# performing functions with vectors
mean(ages)
[1] 57.5
range(ages)
[1] 50 65

9.3 Character vectors

# vector of body parts
organs <- c("lung", "prostate", "breast")

In the example above, each word within the vector is encased in quotation marks, indicating these are character data, rather than object names.

9.4 Challenge 3

Please answer the following questions about organs:

  1. How many values are in organs?
  2. What type of object is organs?

Answers here:

10 Object (data) types and Vectors

  • character: sometimes referred to as string data; values are surrounded by quotes
  • numeric: real numbers (decimals), sometimes referred to as “double”
  • integer: a subset of numeric in which numbers are stored as integers
  • logical: Boolean data (TRUE and FALSE)
  • dates: can save data as seconds, hours, days, months, years, or combinations thereof. We recommend the lubridate package for working with dates.
  • complex: complex numbers with real and imaginary parts (e.g., 1 + 4i)
  • raw: bytes of data (machine readable, but not human readable)
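
The types in the list above can be checked with class(). This is a sketch with arbitrary example values. type_examples and type_classes are made-up names, and sapply() (which applies a function to each item in a list) is something we have not covered yet.

```r
# one example value per type (the values themselves are arbitrary)
type_examples <- list(
  character = "lung",
  numeric   = 57.5,
  integer   = 4L,                     # the L suffix asks for an integer
  logical   = TRUE,
  date      = as.Date("2026-01-08"),
  complex   = 1 + 4i,
  raw       = as.raw(255)
)

# class() reports the type of each value
type_classes <- sapply(type_examples, class)
type_classes

# numeric values are stored as "double" behind the scenes
typeof(57.5)
```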

10.1 Challenge 4

  • R usually converts between data types in the background of most operations, without telling you.
  • The following code is designed to cause some unexpected results in R.
    • What is unusual about each of the following objects?
num_char <- c(1, 2, 3, "a")
num_logical <- c(1, 2, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
tricky <- c(1, 2, 3, "4")
hola <- c("hi", "guten tag", hello)

11 Manipulating vectors

Link to video: https://youtu.be/J0y8Dtvm7bQ

vembedr::embed_youtube("J0y8Dtvm7bQ",  width = 600, height=300)

11.1 Adding values to vectors

ages
[1] 50 55 60 65
# add a value to end of vector
(ages <- c(ages, 90) )
[1] 50 55 60 65 90
# add value at the beginning
(ages <- c(30, ages))
[1] 30 50 55 60 65 90

11.2 Extracting (or excluding) values from vectors

# extracting second value
organs[2] 
[1] "prostate"
# excluding second value
organs[-2] 
[1] "lung"   "breast"
# extracting first and third values
organs[c(1, 3)] 
[1] "lung"   "breast"
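
The same square-bracket idea extends to ranges of positions and to excluding several values at once. This is a sketch that re-creates organs so it runs on its own; the names last_two, only_third, and last_organ are made up for illustration.

```r
organs <- c("lung", "prostate", "breast")

# a range of positions: second through third
last_two <- organs[2:3]
last_two

# excluding more than one position
only_third <- organs[-c(1, 2)]
only_third

# the last value, whatever the length of the vector
last_organ <- organs[length(organs)]
last_organ
```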

12 Missing data

vembedr::embed_youtube("r8RFoTXDs_U")
  • NA indicates a missing value in R.
  • NA is not a character!!!
# create a vector with missing data
heights <- c(2, 4, 4, NA, 6)
  • What happens when we try to calculate the mean or max of a vector with missing data?
# calculate mean and max on vector with missing data
mean(heights)
[1] NA
max(heights)
[1] NA
  • How do we fix this?
# add argument to remove NA
mean(heights, na.rm = TRUE)
[1] 4
max(heights, na.rm = TRUE)
[1] 6
  • Or, we can use na.omit(). Be careful with this: the result carries extra attributes that record which values were dropped.
# remove incomplete cases
na.omit(heights) 
[1] 2 4 4 6
attr(,"na.action")
[1] 4
attr(,"class")
[1] "omit"
mean(na.omit(heights))
[1] 4
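
Two related checks are sketched below, using the same heights vector: finding where the missing values are with is.na(), and seeing that NA is not the same thing as the text "NA". The object name heights_complete is made up for illustration.

```r
heights <- c(2, 4, 4, NA, 6)

# TRUE marks the position of each missing value
is.na(heights)

# count the missing values (TRUE counts as 1, FALSE as 0)
sum(is.na(heights))

# NA (missing) is different from "NA" (a character value)
is.na(NA)
is.na("NA")

# as.numeric() drops the extra attributes that na.omit() adds
heights_complete <- as.numeric(na.omit(heights))
heights_complete
```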

12.1 Challenge 5

Complete the following tasks after creating this vector (Note: there are multiple solutions):

  1. Remove NAs from more_heights (assign the result to the object more_heights_complete)
  2. Calculate the median() of more_heights_complete
# create vector
more_heights <- c(63, 69, 60, 65, NA, 68, 61, 70, 61, 59, 64, 69, 63, 63, NA, 72, 65, 64, 70, 63, 65)
# remove NAs


# calculate the median

13 Vectorization

  • Most of R’s functions are “vectorized”
  • This means that the function will operate on all elements of a vector without needing to use other advanced programming tools such as for loops (more on that later).

We can see this when we try to add vectors together:

x <- 1:4
y <- 6:9
z <- x + y
z
[1]  7  9 11 13
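
For comparison, here is a sketch of what the same addition would look like with a for loop, which we will cover later. The names z_loop and z_vec are made up for illustration.

```r
x <- 1:4
y <- 6:9

# loop version: add one pair of elements at a time
z_loop <- numeric(length(x))      # empty container for the results
for (i in seq_along(x)) {
  z_loop[i] <- x[i] + y[i]
}
z_loop

# vectorized version: one line, same result
z_vec <- x + y
all(z_loop == z_vec)
```

The vectorized version is shorter and easier to read, which is why it is the usual choice in R.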

The arithmetic and comparison operators are vectorized:

z^2
[1]  49  81 121 169
z + 1
[1]  8 10 12 14
z == 9
[1] FALSE  TRUE FALSE FALSE
z > 9
[1] FALSE FALSE  TRUE  TRUE
x / y
[1] 0.1666667 0.2857143 0.3750000 0.4444444

Many other common functions are vectorized as well:

z <- x / y
round(z, 2)
[1] 0.17 0.29 0.38 0.44
z <- c("no", "nope", "maybe")
paste(z, "hi")
[1] "no hi"    "nope hi"  "maybe hi"
stringr::str_replace(z, "o","7")
[1] "n7"    "n7pe"  "maybe"

14 R packages

  • Packages are add-ons that contain functions and/or data.
  • Usually the functions in a package are related to a certain type of data task or analysis method.
  • You only need to install packages once.
  • You need to “load” the packages that your code uses
    • every time you start R, AND
    • in your qmd or R script, with the loading code placed at the top.
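
The install step might look like the sketch below. It assumes you have an internet connection, and the install line is commented out so that nothing is downloaded when the document renders. has_stats is a made-up object name, and stats is used for the check only because it ships with R.

```r
# Install once per computer; the package name must be in quotes.
# A common habit is to run this in the Console rather than in your qmd:
# install.packages("janitor")

# check whether a package is installed, without loading it
has_stats <- requireNamespace("stats", quietly = TRUE)
has_stats
```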

14.1 Loading packages: library() or pacman::p_load()

  • You can load packages with the library() function or with the p_load() function from the pacman package.

  • The following code loads two packages, though the tidyverse package is actually a suite of many packages.

    • This code assumes you have already installed the packages!!!
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr   1.1.4     ✔ readr   2.1.5
✔ forcats 1.0.0     ✔ tibble  3.2.1
✔ ggplot2 3.5.1     ✔ tidyr   1.3.1
✔ purrr   1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()  masks stats::filter()
✖ lubridate::hms() masks vembedr::hms()
✖ dplyr::lag()     masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)

Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test
# OR do this:
pacman::p_load(tidyverse, janitor) 

15 Wrapping up

Today we covered

  • R/RStudio and Quarto
  • Functions
  • Working with objects (vectors) and determining data types
  • Missing data
  • Vectorization
  • R Packages

16 Post Class Survey

Please fill out the post-class survey. I will summarize muddiest and clearest points before each class. Your responses are anonymous in that I separate your names from the survey answers before compiling/reading.

17 Acknowledgements

  • This Intro to R was copied from the BSTA 504 Winter 2023 course, taught by Jessica Minnier. I made minor modifications, primarily updating the material from RMarkdown to Quarto and adding links to an introduction to Quarto from BSTA 511/611.
  • Minnier’s Acknowledgements: